Community structure, which refers to the presence of densely connected groups within a larger network, is a common feature of several real-world networks from a variety of domains, such as the human brain, social networks of hunter-gatherers and business organizations, and the World Wide Web (Porter et al., 2009). Using a community detection technique known as the Louvain optimization method, 17 communities were extracted from the giant component of the phonological network described in Vitevitch (2008). Additional analyses comparing the lexical and phonological characteristics of words in these communities against words in randomly generated communities revealed several novel findings. Larger communities tend to consist of short, frequent words of high degree and low age-of-acquisition ratings, whereas smaller communities tend to consist of longer, less frequent words of low degree and high age-of-acquisition ratings. Real communities also contained fewer distinct phonological segments than random communities, although the segments found in real communities occurred much more often than the same segments did in random communities. Interestingly, within communities relatively few biphones occur very frequently and a large number of biphones occur rarely, a pattern that mirrors the overall frequency distribution of words in a language (Zipf, 1935). The present findings have important implications for understanding how activation spreads among words in the phonological network during lexical processing, as well as for understanding the mechanisms that underlie language acquisition and the evolution of language.
INTRODUCTION

In the past decade or so, the application of graph-theoretic methods to model a variety of complex, large-scale real-world phenomena has burgeoned. Graph-theoretic approaches refer to the techniques developed by mathematicians to characterize and describe the topology or structure of a network (Watts and Strogatz, 1998; Watts, 2004). Researchers from a multitude of disciplines have applied these techniques to investigate large-scale networks such as the Internet (Yook et al., 2002), scientific collaborations (Barabasi et al., 2002), the human brain (Bullmore and Sporns, 2009), and the mental lexicon (Steyvers and Tenenbaum, 2005; Vitevitch, 2008).

The tools of network science have been applied to study language by creating semantic networks, constructed from either word association data or co-occurrence statistics, and networks of phonological word-forms. Applying graph-theoretic methods to analyze language networks is a fast-growing and particularly productive area of research. Previous work on semantic networks has important implications for the cognitive mechanisms underlying language processing because it suggests that these mechanisms exploit network structure to facilitate the processing of language and the retrieval of semantic knowledge. For instance, researchers have shown that the network structure of word associations is a superior predictor of human responses on a fluency task, which indicates that the search for a relevant response to a given cue depends on the link structure of the semantic network, as well as on the relative importance of these links within memory (Griffiths et al., 2007). The network structure of word associations was also a good predictor of semantic similarity between pairs of seemingly unrelated words, which might indicate a common ontological organization of words and concepts across people (De Deyne et al., 2012). Work by Hills et al. (2009, 2010) on the early semantic networks of children has also contributed considerable insight into the longitudinal development of the mental lexicon, especially in terms of the roles of different network growth mechanisms in language acquisition. Their results also showed that the network of words within a child's language learning environment is an important predictor of the words that children learn first, a finding with important implications for language acquisition.

The tools of network science have also been used to model the organization of phonological word-forms in the mental lexicon (Vitevitch, 2008). In a phonological network, nodes represent phonological word-forms, and links connect words that are phonologically similar to each other. Two words are said to be phonologically similar, or "phonological neighbors" of each other, if the first word can be transformed into the second word by the substitution, addition, or deletion of one phoneme (Luce and Pisoni, 1998). The phonological network examined in Vitevitch (2008) displayed the properties of a small-world network, that is, a short average path length and a high average clustering coefficient, features of network topology that have been commonly observed in other real-world networks (Watts and Strogatz, 1998).
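For readers unfamiliar with these two measures, the following minimal Python sketch computes the local clustering coefficient, the average clustering coefficient, and the average shortest path length for a small graph stored as a dictionary of neighbor sets. The function names and the four-node toy graph are hypothetical illustrations, not the software or data used by Vitevitch (2008); nodes with fewer than two neighbors are assigned a clustering coefficient of 0, which is one common convention. Libraries such as NetworkX provide equivalent functions (`average_clustering` and `average_shortest_path_length`).

```python
from collections import deque
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: nodes with fewer than two neighbors get C = 0
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search, since edges are unweighted
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(round(average_clustering(toy), 3))   # 0.583
print(round(average_path_length(toy), 3))  # 1.333
```

In a small-world network, the average clustering coefficient is much higher than in a comparable random graph while the average path length remains comparably short.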

With regard to the phonological network, the clustering coefficient of a word, a network science measure that describes the local structure around a node, has been shown to influence the lexical processing of spoken words (Chan and Vitevitch, 2009, 2010), as well as long- and short-term memory processes (Vitevitch et al., 2012). These results are theoretically important because they place additional constraints on current models of spoken word recognition, which cannot accommodate these findings because they do not explicitly take into account the role of network structure in lexical processing.

Taken together, prior work applying the theory and methods of network science to the study of language has been invaluable: these studies have provided evidence for the psychological reality of the structure of semantic and phonological networks, and have shown that network structure has measurable influences on lexical processing and language acquisition.

The recent movement toward using network science to describe the overall structure of the lexicon contrasts with the traditional approach of psycholinguistic research, which has typically focused on the lexical characteristics of individual words, such as word frequency and neighborhood density. Previous work from the network science approach has shown that language networks and other complex networks share several macro-level features, such as being "small-world" (i.e., having short average path lengths and large average clustering coefficients; Steyvers and Tenenbaum, 2005; Vitevitch, 2008) and possessing a degree distribution that approximates a power law (Steyvers and Tenenbaum, 2005; Hills et al., 2009; but not the phonological network, whose degree distribution is better fit by a truncated power law; see Arbesman et al., 2010b). In contrast, previous psycholinguistic research has largely concentrated on the influence of micro-level lexical variables (i.e., characteristics of individual words) on spoken word recognition and production (e.g., Savin, 1963; Broadbent, 1967; Taft and Hambly, 1986; Luce and Pisoni, 1998; Vitevitch and Luce, 1998, 1999; Garlock et al., 2001).

Between these two levels lies the meso-level of a network, and there are theoretically important reasons to investigate the meso-level of the phonological network. This paper represents a first step in this direction by extracting and analyzing the community structure of the phonological network of words. In the following paragraphs I provide examples of how community detection techniques have been used to study other complex networks, and briefly show how these techniques have enhanced our understanding of the structure and dynamics of networks. Then, to motivate the present work, I briefly discuss predictions about the community structure of phonological word-forms and the potential theoretical significance of applying community detection to the phonological network.

Community structure refers to the presence of several smaller groups of nodes contained in a larger network. These smaller groups form such that there are many connections among nodes within a group, but few connections between nodes in different groups (Newman and Girvan, 2004; Newman, 2006). This phenomenon has caught the attention of network scientists because communities have generally been observed to be a ubiquitous feature of real-world networks in a variety of domains, such as the structure of the human brain (Wu et al., 2011), social networks of hunter-gatherers, business organizations and Facebook friends (Porter et al., 2009), and the World Wide Web (Newman, 2004). The observation that real-world networks tend to divide naturally into smaller subnetworks has led to the general hypothesis that this natural division reflects either the presence of a hierarchical structure, in which larger communities consist of smaller communities in an iterative pattern (Ravasz and Barabasi, 2003), or the encapsulation of functions or local interactions in a complex system (Girvan and Newman, 2002; Newman and Girvan, 2004). For instance, community structure may indicate the presence of protein clusters with similar biological functions in a protein-interaction network (Ravasz et al., 2002) or reflect the underlying social organization and hierarchy of societies (Porter et al., 2009).

The vast majority of the literature has focused on using the tools of network science to delineate the network topology of a complex system. Ultimately, however, the goal is to understand how network structure influences the dynamics and functioning of a network. Community detection analyses have the potential to reveal details of network structure that may be observable neither at the coarse, top-most level of analysis nor by examining the individual nodes that comprise the system (Lancichinetti et al., 2010; Onnela et al., 2012). In a study of the spread of disease in a network whose community structure reflected the social make-up of a society, Kitchovitch and Lio (2011) showed that disease tends to spread more efficiently within communities than across them, an insight into the spread of disease that would not have been possible if only the dynamics of the entire network or of individuals in the system had been analyzed. Similarly, uncovering community structure in the phonological network can enhance our understanding of how the dynamic processes underlying word recognition and production, specifically the spreading of activation among word nodes, are affected by the community structure of the network.

In the phonological network, undirected and unweighted links are placed between words that are phonologically similar to each other. Therefore, phonologically similar words that share common phonological segments tend to cluster together and are likely to form a community within the network. The presence of communities in the phonological network could thus reflect the grouping of these phonological segments in English. As evidenced by high clustering coefficients, there exist naturally occurring clusters within the phonological network in Vitevitch (2008), because the distributions of phonemes that make up any single word are not random. Community detection can reveal the presence of these clusters, which may enhance our understanding of the underlying phonological structure of language.

Phonemes are widely recognized as the smallest units of sound that distinguish meaning in a language [Sapir, 1933; but see Savin and Bever (1970) for evidence against the perceptual reality of phonemes], and words are formed by stringing together sequences of phonemes. A long-standing research topic in psycholinguistics concerns the phonological structure of words in a language, and early researchers studied how the distribution of phonological segments among various word types influences the lexical processing of these words (Greenberg and Jenkins, 1964; Landauer and Streeter, 1973). More recently, it has been shown that certain phonological characteristics of words, such as phonotactic probability (Vitevitch and Luce, 1998, 1999), influence the speed and accuracy of lexical retrieval. Phonological segments (in the simplest case, a pair of phonemes) that occur more frequently than other phonological segments are said to be of high phonotactic probability. Nonwords containing phonological segments of high phonotactic probability are recognized more quickly than nonwords containing phonological segments of low phonotactic probability (Vitevitch and Luce, 1998, 1999; Luce and Large, 2001).
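As a rough illustration of how positional segment and biphone probabilities can be computed, the sketch below counts, for each word position, how often a phoneme (or an adjacent phoneme pair) occurs there, divided by the number of words that have a segment (or pair) at that position, and then sums these probabilities across the positions of a word. This is a simplified, unweighted version assumed for illustration only: published metrics (e.g., Vitevitch and Luce, 2004) additionally weight the counts by word frequency, and the ASCII phoneme codes and five-word toy lexicon are hypothetical.

```python
from collections import Counter


def positional_probabilities(lexicon):
    """Unweighted positional segment and biphone probabilities from a list of transcriptions."""
    seg_counts, seg_totals = Counter(), Counter()
    bi_counts, bi_totals = Counter(), Counter()
    for word in lexicon:
        for i, phone in enumerate(word):
            seg_counts[(i, phone)] += 1
            seg_totals[i] += 1  # number of words with a segment at position i
        for i in range(len(word) - 1):
            bi_counts[(i, word[i], word[i + 1])] += 1
            bi_totals[i] += 1  # number of words with a biphone starting at position i
    seg_prob = {key: n / seg_totals[key[0]] for key, n in seg_counts.items()}
    bi_prob = {key: n / bi_totals[key[0]] for key, n in bi_counts.items()}
    return seg_prob, bi_prob


def word_scores(word, seg_prob, bi_prob):
    """Summed positional segment probability and summed biphone probability of one word."""
    seg_sum = sum(seg_prob.get((i, p), 0.0) for i, p in enumerate(word))
    bi_sum = sum(bi_prob.get((i, word[i], word[i + 1]), 0.0) for i in range(len(word) - 1))
    return seg_sum, bi_sum


# Hypothetical toy lexicon: cat, bat, cab, at, kit (ASCII phoneme codes are assumptions).
LEXICON = [("k", "ae", "t"), ("b", "ae", "t"), ("k", "ae", "b"), ("ae", "t"), ("k", "ih", "t")]
seg_prob, bi_prob = positional_probabilities(LEXICON)
print(word_scores(("k", "ae", "t"), seg_prob, bi_prob))
```

In this toy lexicon, "cat" receives relatively high scores because /k/ is the most common first segment and /ae t/ is the most common biphone at the second position.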

As mentioned earlier, real-world networks tend to divide naturally into smaller subgraphs and display a hierarchical structure that can be observed at the mesoscopic level. Phonological word-forms in the mental lexicon can be said to display such a hierarchical structure as well. For example, short words (such as "cat") embedded in longer words (such as "catalog" or "concatenate"), word clusters that share common onsets (e.g., "cat," "catalog," and "caterpillar"), and words that share common rimes (e.g., "cat," "bat," and "rat") have been well documented and investigated by psycholinguists (e.g., Marslen-Wilson, 1987; McQueen, 1996; Norris et al., 2002; McQueen and Sereno, 2005). In particular, Marslen-Wilson's cohort theory (1987) posits that a word is recognized when its phonological sequence begins to diverge from the phonological sequences of other words sharing the same initial phonological sequence. In this theory, cohorts consist of words that share a common onset, and they become smaller as more phonological information becomes available over time (Marslen-Wilson, 1987); this is analogous to a hierarchy in which a large group of words is subdivided into smaller groups according to the degree of phonological overlap (from the initial phoneme) between words. On the other hand, there is also evidence that word recognition is facilitated when participants are primed with words that share the same rime, and it has been argued that the phonological salience of a group of words with the same rime prompted a biased processing strategy among participants (Norris et al., 2002; McQueen and Sereno, 2005). Therefore, community detection methods could reveal community structure that reflects the grouping of phonological word-forms by cohort or by rime (or potentially both kinds of grouping), which may afford deeper insights into how the overall phonological structure of words influences and facilitates lexical retrieval processes.
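The two kinds of grouping can be made concrete with a toy example. The sketch below uses hypothetical ASCII phoneme codes and rough transcriptions rather than any real dictionary. It shrinks the cohort for "catalog" as each successive phoneme is heard and, separately, groups words by rime, taken here, as a simplifying assumption, to be the last vowel plus everything after it.

```python
from collections import defaultdict

# Hypothetical, rough ARPAbet-style transcriptions used only for illustration.
LEXICON = {
    "cat": ("k", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "rat": ("r", "ae", "t"),
    "cab": ("k", "ae", "b"),
    "kit": ("k", "ih", "t"),
    "catalog": ("k", "ae", "t", "ah", "l", "ao", "g"),
    "caterpillar": ("k", "ae", "t", "er", "p", "ih", "l", "er"),
}
VOWELS = {"ae", "ih", "ah", "ao", "er"}


def cohort(lexicon, heard):
    """Words whose transcription begins with the phonemes heard so far."""
    heard = tuple(heard)
    return sorted(w for w, phones in lexicon.items() if phones[:len(heard)] == heard)


def rime(phones):
    """Simplifying assumption: the rime is the last vowel plus any following consonants."""
    last_vowel = max(i for i, p in enumerate(phones) if p in VOWELS)
    return phones[last_vowel:]


def rime_groups(lexicon):
    """Group words that share the same rime."""
    groups = defaultdict(set)
    for word, phones in lexicon.items():
        groups[rime(phones)].add(word)
    return dict(groups)


# The cohort for "catalog" shrinks as each successive phoneme arrives.
target = LEXICON["catalog"]
for n in range(1, 5):
    print(target[:n], cohort(LEXICON, target[:n]))
print(rime_groups(LEXICON)[("ae", "t")])
```

The onset-based cohorts are nested (each is a subset of the previous one), whereas the rime groups cut across them, which is why a community partition could in principle reflect either organization.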

Examining the community structure of the phonological network may also have important implications for various aspects of psycholinguistics and the language sciences. Here I briefly speculate on how community structure in the phonological network may enhance our understanding of lexical processes. Although the average adult mental lexicon consists of 30,000 to 80,000 words (Aitchison, 2012), people are able to recognize and produce words rapidly and efficiently. This ability to retrieve word-forms efficiently from a large, relatively dense, and highly clustered network of words suggests that lexical retrieval mechanisms may exploit the community structure of the phonological network to facilitate rapid and accurate word recognition and production.

Community detection could also reveal how network structure at different levels of the network influences lexical processing in distinct ways. The finding that high-probability segments facilitate lexical processing is seemingly at odds with the neighborhood density effect observed for spoken words, whereby words with many phonological neighbors are in fact recognized less accurately and more slowly than words with fewer neighbors (Luce and Pisoni, 1998; Goh et al., 2009). Yet words belonging to dense neighborhoods by definition also contain high-probability segments. Phonological similarity therefore appears to be implicated simultaneously in the facilitatory effects of probabilistic phonotactics and in the inhibitory effects of neighborhood density in spoken word recognition.

Investigating the structure of the phonological network at various levels could help us understand the opposing effects of phonotactics and density on spoken word recognition. Neighborhood density is a micro-level measure of network structure, as it is simply the degree of a node. Community structure, on the other hand, measures network structure at the meso-level because it assesses the connectivity of words beyond a word's local neighborhood. It is possible that phonotactic effects on processing emerge as a consequence of the community structure of the phonological network. Therefore, phonotactic and neighborhood effects may not be entirely contradictory if one considers the connectivity of phonological word-forms at various levels of the network. This approach is somewhat analogous to the adaptive resonance framework that Vitevitch and Luce (1999) proposed to account for their finding that facilitatory effects of phonotactic probability were observed when processing nonwords, whereas competitive effects of neighborhood density were observed when processing words. This framework posits sublexical and lexical representations that have dissociable and distinct effects on lexical processing, effects that emerge depending on the nature of the processing task.

Previously, Landauer and Streeter (1973) and others (Frauenfelder et al., 1993; Schiller et al., 1996; Kessler and Treiman, 1997) studied the distributional properties of phonological segments in language using straightforward metrics (such as frequency counts of individual phonemes and biphones). These approaches, however, were limited by the computational power and technology available at the time. Now, the tools of network science and community detection techniques can be used to answer intriguing questions about the underlying phonological structure of a language. Current metrics of phonotactic probability have focused on segment and biphone co-occurrence probabilities (e.g., Vitevitch and Luce, 2004). However, Auer and Luce (2005) noted a need to develop metrics of phonotactic probability that capture larger phonological sequences (i.e., sequences longer than a pair of phonemes) and to assess their influence on speech perception and production. Investigating the phonological segments of words that belong to specific communities could allow us to extract longer phonological sequences that frequently co-occur among words, and ultimately to determine whether listeners are sensitive to these larger segments in speech processing and whether the processing of words that contain them is facilitated.
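One simple way to look for such longer sequences, offered here only as a hypothetical sketch, is to count phoneme trigrams inside a community and flag those whose share of trigram occurrences within the community exceeds their share in the whole lexicon. The enrichment ratio, the `min_count` threshold, and the toy words below are illustrative assumptions, not an established metric.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count contiguous phoneme n-grams across a list of transcriptions."""
    counts = Counter()
    for phones in words:
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    return counts


def enriched_ngrams(community, lexicon, n=3, min_count=2):
    """n-grams over-represented in a community relative to the whole lexicon.

    Assumes every word in `community` also appears in `lexicon`, so each n-gram
    found in the community has a non-zero count in the lexicon.
    """
    inside = ngram_counts(community, n)
    overall = ngram_counts(lexicon, n)
    inside_total, overall_total = sum(inside.values()), sum(overall.values())
    enriched = {}
    for gram, count in inside.items():
        if count < min_count:
            continue  # ignore sequences too rare to be informative
        ratio = (count / inside_total) / (overall[gram] / overall_total)
        if ratio > 1:
            enriched[gram] = ratio
    return enriched


# Hypothetical toy data: batter, matter, latter, and bat form the "community".
community = [("b", "ae", "t", "er"), ("m", "ae", "t", "er"), ("l", "ae", "t", "er"), ("b", "ae", "t")]
others = [("k", "ih", "t"), ("s", "ih", "t"), ("b", "ih", "t", "er")]
print(enriched_ngrams(community, community + others))
```

Setting `n` to values larger than 2 is what distinguishes this from biphone-based measures; whether listeners are sensitive to the sequences it flags would remain an empirical question.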

To recapitulate, the aim of the present paper is to uncover the community structure of the phonological network described in Vitevitch (2008). To this end, a common community detection technique known as the Louvain optimization method was applied to the giant component of the phonological network. To compare the mean lexical characteristics and biphone distributions of the observed communities against a "baseline," the same words from the giant component were randomly assigned to communities of sizes comparable to the observed communities. The lexical characteristics analyzed in this paper include word length, subjective familiarity, word frequency, neighborhood density, neighborhood frequency, positional segment and biphone probability, and age of acquisition. These characteristics were chosen because they are known to influence the speed and accuracy of lexical processing in a variety of psycholinguistic experimental paradigms, such as lexical decision, perceptual identification, and word shadowing (Savin, 1963; Broadbent, 1967; Taft and Hambly, 1986; Luce and Pisoni, 1998; Turner et al., 1998; Vitevitch and Luce, 1998, 1999; Garlock et al., 2001; Ghyselinck et al., 2004; Goh et al., 2009). Past work investigating the statistical properties of words has also focused on comparing word length, frequency, and neighborhood density (e.g., Zipf, 1935; Frauenfelder et al., 1993); to relate the present analyses back to that work, the same variables are investigated here. Because words belonging to the same community may also share similar phonological properties, mean positional segment and biphone probabilities are also analyzed; these variables are commonly used measures of the phonemic properties of words and have also been shown to influence lexical processing (Vitevitch and Luce, 1998; Vitevitch et al., 1999).

With respect to the phonological properties of words in communities, words belonging to the same community should share similar phonological characteristics, because the phonological network was constructed on the basis of phonological similarity. However, it is not obvious whether words belonging to the same community will also share similar lexical characteristics. Furthermore, the phonological and lexical properties of words in each community may depend on the size of the community. If one conceptualizes the network as having a self-similar structure in which communities consist of smaller communities (which in turn consist of even smaller communities), the largest community may resemble the "giant component" of the phonological network, whereas smaller communities may resemble lexical islands. The phonological network in Vitevitch (2008) consisted of a giant component of 6,508 words, several lexical islands (small networks of words that are not connected to the giant component), and hermits (individual words that are not connected to any other words). The giant component consisted of words that were shorter, of higher frequency, and of higher neighborhood density (i.e., degree) than words in the lexical islands. Analogously, one might predict that larger communities generally consist of shorter words of higher frequency and higher density than words in smaller communities.



MATERIAL AND METHODS 
THE PHONOLOGICAL NETWORK 

The phonological network in Vitevitch (2008) was constructed from approximately 20,000 words obtained from the Hoosier Mental Lexicon (Nusbaum et al., 1984). In this network, each node corresponded to a word's phonological transcription obtained from the Merriam-Webster Pocket Dictionary. An undirected and unweighted link (or edge) was added between two nodes if the two words were phonologically similar.

Phonological similarity was defined as the substitution, addition, or deletion of one phoneme at any position between two given words (Greenberg and Jenkins, 1964; Landauer and Streeter, 1973; Luce and Pisoni, 1998). This measure is commonly used in the literature to calculate the phonological neighborhood of a given word and has a long history in psycholinguistics (Landauer and Streeter, 1973; Luce and Pisoni, 1998). Furthermore, this metric has been shown to be a psychologically valid method of assessing phonological similarity: when asked to produce a word that sounds similar to a given word, participants tend to produce words that differ from the given word by one phoneme (Luce and Large, 2001). For example, the word /kæt/ ("cat") has a phonological neighborhood consisting of /bæt/ ("bat"), /skæt/ ("scat"), and /æt/ ("at"), among other words. In the phonological network, these words are "phonological neighbors" of /kæt/ and are connected via undirected and unweighted links to the node representing /kæt/.
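The one-phoneme rule can be expressed directly in code. The following minimal sketch assumes that each transcription is a tuple of phoneme symbols (the ASCII codes and the six-word lexicon are hypothetical). It tests whether two transcriptions are neighbors and then links every neighboring pair to build an undirected, unweighted network; a word's neighborhood density is then simply the size of its neighbor set, that is, its degree. The pairwise loop is quadratic in the number of words, which is adequate for a sketch but slow for a lexicon of 20,000 words.

```python
def is_neighbor(a, b):
    """True if transcription b differs from a by one phoneme substitution, addition, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):  # substitution: exactly one position differs
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # addition/deletion: dropping one phoneme from the longer word yields the shorter one
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def build_network(lexicon):
    """Undirected, unweighted phonological network as {word: set of neighbors}."""
    words = list(lexicon)
    adj = {w: set() for w in words}
    for i, u in enumerate(words):
        for v in words[i + 1:]:
            if is_neighbor(lexicon[u], lexicon[v]):
                adj[u].add(v)
                adj[v].add(u)
    return adj


# Hypothetical toy lexicon; transcriptions are tuples of ASCII phoneme codes.
LEXICON = {
    "cat": ("k", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "scat": ("s", "k", "ae", "t"),
    "at": ("ae", "t"),
    "cut": ("k", "ah", "t"),
    "dog": ("d", "ao", "g"),
}
network = build_network(LEXICON)
density = {w: len(nbrs) for w, nbrs in network.items()}  # neighborhood density = degree
print(sorted(network["cat"]), density["cat"])
```

In this toy lexicon, "dog" has no neighbors and would therefore be a hermit.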

Vitevitch (2008) found that the network consisted of a giant component of 6,508 words, lexical islands (words that are connected to each other but not to any words in the giant component), and lexical hermits (words that have no phonological neighbors, known as isolates in the network science literature). In the present analyses, I extracted the community structure of the giant component of 6,508 words. Islands and hermits were excluded from the analyses because, by definition, each island and each hermit constitutes a "community" of its own, so community detection conducted on these words is unlikely to yield meaningful or interpretable results.
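Separating the giant component from islands and hermits amounts to finding connected components. The sketch below does this with an iterative depth-first search over a dictionary of neighbor sets; it assumes, as in the phonological network, that one component is clearly the largest. The function names and the seven-node graph are hypothetical (NetworkX's `connected_components` offers the same functionality).

```python
def connected_components(adj):
    """Connected components of an undirected graph given as {node: set of neighbors}."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            u = stack.pop()
            component.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components


def split_giant_islands_hermits(adj):
    """Largest component, the remaining multi-node components (islands), and isolates (hermits)."""
    comps = sorted(connected_components(adj), key=len, reverse=True)
    giant, rest = comps[0], comps[1:]
    islands = [c for c in rest if len(c) > 1]
    hermits = [next(iter(c)) for c in rest if len(c) == 1]
    return giant, islands, hermits


# Hypothetical graph: a four-word chain, a two-word island, and one hermit.
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
       "e": {"f"}, "f": {"e"}, "g": set()}
giant, islands, hermits = split_giant_islands_hermits(toy)
print(sorted(giant), islands, hermits)
```

Community detection would then be run on the subgraph induced by `giant` only.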

COMMUNITY DETECTION 

Modularity, Q, measures the density of links inside communities as compared to links between communities¹ (Newman, 2006; Fortunato, 2010), and is mathematically defined as

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

where $A_{ij}$ is the element of the adjacency matrix giving the weight of the edge between nodes $i$ and $j$; $k_i = \sum_j A_{ij}$ is the sum of the weights of the edges attached to node $i$; $c_i$ is the community to which node $i$ is assigned and $c_j$ is the community to which node $j$ is assigned; the function $\delta(c_i, c_j)$ is 1 if $c_i = c_j$ and 0 otherwise; and $m = \frac{1}{2} \sum_{i,j} A_{ij}$ is the total weight of all edges in the network. Because the present network has unweighted edges, $A_{ij}$ reduces to a binary matrix (1 if nodes $i$ and $j$ are linked and 0 otherwise), $k_i$ and $k_j$ are simply the numbers of edges attached to nodes $i$ and $j$, respectively, and $m$ is the number of edges.

1 The definition of Q may appear similar to that of the clustering coefficient, C, but the two describe very different concepts. C measures the extent to which the neighbors of a node are also neighbors of each other, and is a micro-level measure because it is computed at the level of individual nodes, whereas Q is a meso-level measure that indicates the robustness of the community structure of a network. See Borge-Holthoefer and Arenas (2010) for a discussion of the distinction between micro- and meso-levels of a network.
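The definition can be checked numerically. The following sketch evaluates Q literally, summing over all ordered node pairs (including i = j) for an unweighted graph stored as a dictionary of neighbor sets; the function name and the toy graph of two triangles joined by a single edge are illustrative assumptions. For that graph, with the two triangles as communities, Q = 5/14 ≈ 0.357, which agrees with the equivalent per-community form Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c and d_c is the summed degree of its nodes.

```python
def modularity(adj, communities):
    """Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j) for an unweighted, undirected graph."""
    label = {node: idx for idx, comm in enumerate(communities) for node in comm}
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    two_m = sum(degree.values())  # every undirected edge is counted twice
    total = 0.0
    for i in adj:
        for j in adj:  # ordered pairs, including i == j
            if label[i] != label[j]:
                continue  # delta(c_i, c_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            total += a_ij - degree[i] * degree[j] / two_m
    return total / two_m


# Hypothetical graph: two triangles (0, 1, 2) and (3, 4, 5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(round(modularity(adj, [{0, 1, 2}, {3, 4, 5}]), 4))  # 0.3571, i.e. 5/14
```

Placing every node in a single community always gives Q = 0, which is why positive values signal more within-community links than expected by chance. The double loop is quadratic in the number of nodes; the per-community form is far cheaper for a network of 6,508 words.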

Modularity is also used as a measure of the quality of the partitions produced by community detection methods (Fortunato, 2010). Positive Q values close to the maximum value of 1.0 indicate the presence of high-quality communities,² in which the density of links within communities is high relative to the density of links between communities. Q values of large real-world networks such as the Internet and cellular phone networks, as well as of smaller social networks such as Zachary's karate club, range from 0.42 to 0.78 (Blondel et al., 2008). Although some variability exists among the various kinds of complex networks, the fact that all of these networks have clearly positive Q values implies that their community structure is robust, that is, that the partitions delineating communities in these networks are highly distinct.

The Louvain method is a modularity optimization, or "greedy" optimization, approach. The algorithm consists of two phases that are repeated iteratively. In the first phase, each node is initially assigned to its own community, such that there are as many communities as there are nodes. For each node i, the gain in modularity that would result from removing i from its community and placing it in the community of each of its neighbors is evaluated, and i is placed in the community that yields the greatest gain, provided that this gain is positive; otherwise, i remains in its original community. This is done for every node in the network and repeated until no single move improves modularity. In the second phase, a new network is built whose nodes are the communities found in the first phase. The two phases are iterated until no further gain in Q can be obtained. Although the output of this algorithm varies depending on the order in which the nodes are considered, Blondel et al. (2008) indicate that the order does not have a significant influence on the quality of the partitions produced by the algorithm.
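To make the two phases concrete, here is a minimal, unoptimized Python sketch of the procedure described above; it is not Gephi's implementation, and all function names are hypothetical. Phase 1 uses the standard gain for moving an isolated node i into community C, k_i,in − k_i·Σ_tot/(2m), which equals ΔQ up to the positive factor 1/m (k_i,in is the weight of the edges from i into C, and Σ_tot is the summed degree of the nodes in C). A move is made only when it beats staying put, and nodes are visited in dictionary insertion order, one of the arbitrary orderings mentioned above. For real analyses, NetworkX (version 2.8 or later) provides `louvain_communities`.

```python
from collections import defaultdict


def one_level(adj):
    """Phase 1: move single nodes between neighboring communities while Q increases.

    adj is a symmetric weighted graph {u: {v: w}}; a self-loop adj[u][u] stores A_uu.
    """
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())
    comm = {u: u for u in adj}  # every node starts in its own community
    tot = dict(degree)          # Sigma_tot: summed degree of each community
    moved_any, improved = False, True
    while improved:
        improved = False
        for u in adj:
            old = comm[u]
            tot[old] -= degree[u]       # temporarily isolate u
            links = defaultdict(float)  # k_{u,in}: weight from u into each community
            for v, w in adj[u].items():
                if v != u:
                    links[comm[v]] += w

            def gain(c):  # m * Delta Q for inserting the isolated node u into c
                return links.get(c, 0.0) - degree[u] * tot[c] / two_m

            best, best_gain = old, gain(old)  # staying put is the default
            for c in links:
                if gain(c) > best_gain + 1e-12:
                    best, best_gain = c, gain(c)
            comm[u] = best
            tot[best] += degree[u]
            if best != old:
                improved = moved_any = True
    return comm, moved_any


def aggregate(adj, comm):
    """Phase 2: build a network whose nodes are the communities found in phase 1."""
    labels = {c: i for i, c in enumerate(dict.fromkeys(comm.values()))}
    new_adj = {i: defaultdict(float) for i in labels.values()}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            new_adj[labels[comm[u]]][labels[comm[v]]] += w  # internal weight becomes a self-loop
    return {c: dict(nbrs) for c, nbrs in new_adj.items()}, {u: labels[comm[u]] for u in adj}


def louvain(edges):
    """Communities (sets of original nodes) for an unweighted, undirected edge list."""
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    adj = dict(adj)
    membership = {u: u for u in adj}  # original node -> current super-node
    while True:
        comm, moved = one_level(adj)
        if not moved:
            break
        adj, node_to_new = aggregate(adj, comm)
        membership = {u: node_to_new[s] for u, s in membership.items()}
    groups = defaultdict(set)
    for u, s in membership.items():
        groups[s].add(u)
    return list(groups.values())


# Hypothetical graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(louvain(edges))
```

Because every accepted move strictly increases Q and Q is bounded, both phases terminate; the result is a local optimum of Q, not necessarily the global one.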

There exists a host of other community detection methods, such as techniques based on centrality or edge betweenness, and dynamic methods such as clique percolation (e.g., Derenyi et al., 2005). What distinguishes the Louvain method from the others, however, is its simple yet intuitive algorithm, which is reminiscent of the self-similar nature of complex networks (Blondel et al., 2008). The algorithm also integrates the idea of hierarchy within a network, as communities of communities are built at each pass (Blondel et al., 2008). Related to this is the notion of a resolution limit in modularity optimization approaches. The resolution limit refers to the observation that whether small communities can be successfully extracted using modularity optimization methods depends on the size of the network and the extent of interconnectedness of its communities (Fortunato and Barthelemy, 2007; Porter et al., 2009). To address this issue, a resolution parameter can be specified in order to extract communities at a particular level of the network's community structure, such that neither too many nor too few communities are extracted from the network.

For the phonological network, the Louvain algorithm was run five times at each of several resolutions, from 1.0 to 5.0 in increments of 1.0. A t-test comparing the modularity values at resolutions of 1.0 (average Q = 0.675) and 2.0 (average Q = 0.667) showed that these values did not differ statistically from each other, and both were significantly higher than those obtained at resolutions of 3.0 or higher. A table summarizing the Q values and numbers of communities yielded by the algorithm at each resolution is included in the Supplementary Materials. These results indicate that the communities extracted at resolutions of 1.0 and 2.0 were not only similar in quality but also of very high quality. Here, a resolution of 2.0 was used because it yielded a smaller number of communities at a slightly higher level of the hierarchy (Lambiotte et al., 2008), which facilitates statistical analyses and the interpretation of results. Note that using the communities extracted at a resolution of 1.0 did not result in qualitative differences in the interpretation of the following analyses.

2 The maximum value of 1.0 is approached when the network consists of an infinite number of disjoint cliques (independent groups of nodes that do not connect to each other); the mathematical proof can be found in Fortunato and Barthelemy (2007). Modularity values close to 1.0 generally indicate that community structure is a robust feature of the network.
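One common way to formalize a resolution parameter is to scale the null-model term of Q by a factor γ, giving Q_γ = Σ_c [L_c/m − γ(d_c/2m)²]. In this γ convention (the one used by, for example, the `resolution` argument of NetworkX's `modularity` and `louvain_communities`), larger γ favors more, smaller communities. The text reports the opposite behavior for the resolution setting used here (2.0 gave fewer communities than 1.0), so the two parameters run in opposite directions and should not be equated. The sketch below, with a hypothetical two-triangle graph, shows how the preferred partition flips as γ changes.

```python
def modularity_gamma(edges, communities, gamma=1.0):
    """Q_gamma = sum_c [L_c / m - gamma * (d_c / 2m)^2] for an unweighted, undirected edge list."""
    label = {node: idx for idx, comm in enumerate(communities) for node in comm}
    m = len(edges)
    internal = [0] * len(communities)    # L_c: edges inside community c
    degree_sum = [0] * len(communities)  # d_c: summed degree of community c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(inside / m - gamma * (d / (2 * m)) ** 2
               for inside, d in zip(internal, degree_sum))


# Hypothetical graph: two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
split = [{0, 1, 2}, {3, 4, 5}]
merged = [{0, 1, 2, 3, 4, 5}]
for gamma in (0.2, 1.0, 2.0):
    print(gamma, round(modularity_gamma(edges, split, gamma), 3),
          round(modularity_gamma(edges, merged, gamma), 3))
```

For this graph the single merged community scores higher when γ is below 2/7, and the two-triangle split scores higher above it.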

Although a wide variety of community detection methods exists for determining the presence of community structure in networks [see Porter et al. (2009) or Fortunato (2010) for reviews of these algorithms], the Louvain method was chosen because of the high quality of the communities it detects and its short computation times (Blondel et al., 2008). Although the choice of the Louvain method is admittedly somewhat arbitrary, several researchers have noted that most community detection methods yield very similar results despite having different algorithms, differing only in nuanced details such as whether communities are allowed to overlap (Porter et al., 2009). Therefore, it is unlikely that the communities obtained using the Louvain method are an artifact of using a particular community detection algorithm. The Louvain algorithm is readily available in Gephi (Bastian et al., 2009).

RANDOM COMMUNITIES 

A baseline model of "random" communities was constructed in 
order to provide a point of comparison for the lexical charac- 
teristics of the communities extracted from the giant component 
by the detection algorithm. The random communities were gen- 
erated by randomly assigning words from the giant component 
to the same number of communities with the same sizes as those 
extracted using the Louvain method. This permits a meaningful 
and unbiased comparison of the "real" communities that were 
generated using the community detection algorithm and "random" communities that were obtained by arbitrarily grouping words into communities of the same sizes. This randomization
procedure is commonly used to generate baseline communities 
in studies that have investigated community structure in other 
complex networks (e.g., Traud et al., 2008). 
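The randomization procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `random_communities` and the toy word list are hypothetical, and words are assumed to be plain strings.

```python
import random


def random_communities(words, sizes, seed=None):
    """Randomly partition `words` into groups whose sizes match `sizes`.

    Every word is assigned to exactly one group, so the baseline keeps the
    number and sizes of the detected communities but ignores the edges.
    """
    if sum(sizes) != len(words):
        raise ValueError("community sizes must add up to the number of words")
    rng = random.Random(seed)          # seeded generator for reproducibility
    shuffled = list(words)
    rng.shuffle(shuffled)
    groups, start = [], 0
    for size in sizes:                 # cut the shuffled list into consecutive slices
        groups.append(shuffled[start:start + size])
        start += size
    return groups


# Hypothetical toy lexicon (not data from the study).
toy_words = ["cat", "bat", "hat", "cap", "map", "mat", "dog", "log"]
groups = random_communities(toy_words, [2, 2, 4], seed=1)
print([len(g) for g in groups])  # [2, 2, 4]
```

Because the sizes are fixed in advance, any difference between real and random communities cannot be attributed to community size alone.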

RESULTS 

COMMUNITIES IN THE PHONOLOGICAL NETWORK 

Using a resolution of 2.0, the community detection algorithm found 17 communities, with a mean size of 382.82 (SD = 249.29) nodes per community. As shown in Table 1, the sizes of the 17 communities varied, ranging from 31 to 697 words.

Table 1 | Community sizes for 17 communities extracted from the phonological network.

Community    N
1            31
2            37
3            38
4            85
5            127
6            271
7            278
8            348
9            397
10           520
11           543
12           544
13           625
14           626
15           654
16           687
17           697
MEAN         382.82
SD           249.29
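As a check on Table 1, the community sizes can be summed and summarized with Python's standard `statistics` module. The size of community 10 (520 words) is the value implied by the 6,508-word giant component and the other 16 sizes; with it, the sizes reproduce the reported mean and the sample standard deviation (n − 1 denominator, which is an assumption about how SD was computed).

```python
import statistics

# Community sizes, ordered from smallest (community 1) to largest (community 17).
sizes = [31, 37, 38, 85, 127, 271, 278, 348, 397, 520,
         543, 544, 625, 626, 654, 687, 697]

total = sum(sizes)               # number of words in the giant component
mean = statistics.mean(sizes)    # mean community size
sd = statistics.stdev(sizes)     # sample standard deviation (n - 1 denominator)
print(total, round(mean, 2), round(sd, 2))  # 6508 382.82 249.29
```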



Communities were relabeled such that community 1 represented the smallest community and community 17 represented the largest community. Modularity, Q, was 0.655, a moderately large positive value, which implies the presence of robust community structure in the phonological network. Notably, this value lies within the range of modularity values obtained for various networks in Blondel et al. (2008).
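To make the modularity value concrete, the sketch below computes the standard Newman-Girvan modularity Q for a given partition of an undirected, unweighted graph. It is an illustration on a hypothetical toy graph, not the Gephi implementation: the resolution parameter used with the Louvain method in Gephi is not modelled, and edges are assumed to be listed once with no self-loops.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected graph.

    Q = sum over communities c of [ L_c / m - (d_c / (2m)) ** 2 ],
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the summed degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)     # L_c: edges with both ends in c
    degree_sum = defaultdict(int)   # d_c: total degree of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, partition)
print(round(q, 3))  # 0.357
```

Putting every node into a single community gives Q = 0, so values well above 0 indicate more within-community edges than expected by chance.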

In order to assess whether the observed community structure is indeed a genuine feature of the phonological network, two Erdos-Renyi (ER) graphs with 6,508 nodes were generated in Pajek (Batagelj and Mrvar, 1998). ER Graph A was generated with the same mean degree as that of the real network (29,627 edges/6,508 nodes = 4.55). This produced a random graph with a mean degree of 4.535, but with only 14,757 edges, far fewer than the 29,627 edges in the real network. Therefore, a second graph,
ER Graph B, was generated with a higher mean degree of 9.105, 
which produced a random graph with a mean degree of 9.198 
and 29,929 edges. Note that these two ER graphs were gener- 
ated because of the constraints involved in generating a random 
ER graph that had the same number of edges and same mean 
degree as that of the phonological network. This constraint is due 
to the fact that the degree distribution of language networks is 
skewed, whereas the degree distribution of an ER graph resembles 
a Poisson distribution (Erdos and Renyi, 1961; Newman, 2003). 
Therefore, two different graphs were generated. The Louvain 
community detection algorithm (using the same parameters used 
to detect communities in the phonological network) was applied 
to both ER graphs. ER Graph A yielded 78 communities with a 
modularity of 0.232 and ER Graph B yielded 2 communities with 
a modularity of 0.0. 
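For reference, in an undirected graph every edge contributes to the degree of two nodes, so the mean degree is 2m/n. The sketch below applies this identity to the edge counts reported above: 2 × 29,627/6,508 ≈ 9.105 (the value used for ER Graph B), and the reported mean degrees of 4.535 and 9.198 equal 2m/n for the 14,757 and 29,929 edges of Graphs A and B. The `erdos_renyi_gnm` helper is a hypothetical illustration of the fixed-edge-count G(n, m) model, not necessarily the model Pajek used.

```python
import random

N_NODES = 6508  # nodes in the giant component and in both ER graphs


def mean_degree(n_nodes, n_edges):
    """Mean degree of an undirected simple graph: each edge adds 1 to two nodes."""
    return 2 * n_edges / n_nodes


def erdos_renyi_gnm(n, m, seed=0):
    """Sample G(n, m): m distinct undirected edges drawn uniformly at random."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:                              # no self-loops
            edges.add((min(u, v), max(u, v)))   # store each edge once
    return edges


for label, m in [("phonological network", 29627),
                 ("ER Graph A", 14757),
                 ("ER Graph B", 29929)]:
    print(f"{label}: {m / N_NODES:.2f} edges per node, "
          f"mean degree {mean_degree(N_NODES, m):.3f}")

g = erdos_renyi_gnm(N_NODES, 29627, seed=42)
print(len(g), round(mean_degree(N_NODES, len(g)), 3))
```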

Most notably, the modularity values of the randomly generated networks (Q = 0.232 and 0.0) were much smaller than that of the phonological network (Q = 0.655). This indicates that the communities extracted from these random graphs are of lower quality than the communities extracted from the phonological network. Recall that a large modularity value (close to 1) implies not only that community structure exists within the network but also that the structure is highly robust (Blondel et al., 2008). The high modularity value of the phonological network relative to the random networks strongly suggests that the language network consists of tightly connected communities, and therefore this community structure is worth investigating further. In the following section, statistical analyses are conducted on the mean lexical characteristics and phonotactic properties of these communities.

LEXICAL CHARACTERISTICS IN THE COMMUNITIES 

As a basis of comparison in the analyses of various lexical charac- 
teristics, the community membership of words in the giant com- 
ponent was randomized to form "random communities" (which 
should not be confused with the Erdos-Renyi "random networks" used in the previous section, "Communities in the Phonological Network"). To distinguish them, the communities found in the giant component using the Louvain method will be referred to as "real communities."

To investigate how the 17 real (and random) communi- 
ties might be distinguished from each other, 1-way between- 
group ANOVAs (with the 17 communities as the indepen- 
dent variable) were conducted to compare the mean lexical 
characteristics of words in the real and random communities. 
These lexical characteristics (i.e., the dependent variables in the ANOVAs) include word length, subjective familiarity, word frequency, neighborhood density, neighborhood frequency, positional and biphone probability, and age of acquisition. Table 2
summarizes the mean of all lexical variables for each of the 
17 real and random communities. Note that the communi- 
ties have been relabeled such that community 1 represents the 
smallest community and community 17 represents the largest 
community. 

A significant ANOVA indicates that the lexical characteris- 
tic that is being compared is significantly different across the 
17 communities in either the real or the random communities. 
The results of the 1-way ANOVAs are summarized in Table 3. 
Because several of Levene's tests of homogeneity of variances were significant (see Table 3), the assumption of homogeneity of variances is violated. Since ANOVA is not robust to heteroscedasticity when group sizes are unequal (Maxwell and Delaney, 2004), alternative F-statistics using corrected degrees of freedom were calculated where relevant for the omnibus F-tests.
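The text does not state which corrected F-statistic was used. Welch's heteroscedasticity-robust ANOVA is one common choice, and the sketch below shows how it yields non-integer denominator degrees of freedom like those in Table 3. It assumes each group has at least two observations and non-zero variance; the toy groups are hypothetical.

```python
import statistics


def welch_anova(groups):
    """Welch's one-way ANOVA for unequal variances.

    Returns (F, df1, df2); df2 is usually non-integer.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]   # n - 1 denominator
    w = [nj / vj for nj, vj in zip(n, variances)]           # precision weights
    total_w = sum(w)
    grand = sum(wj * mj for wj, mj in zip(w, means)) / total_w   # weighted grand mean
    a = sum(wj * (mj - grand) ** 2 for wj, mj in zip(w, means)) / (k - 1)
    lam = sum((1 - wj / total_w) ** 2 / (nj - 1) for wj, nj in zip(w, n))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return a / b, k - 1, (k ** 2 - 1) / (3 * lam)


# Hypothetical toy data: three groups of three observations each.
F, df1, df2 = welch_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(round(F, 4), df1, round(df2, 2))  # 2.5714 2 4.0
```

The Brown-Forsythe test is another widely used alternative; the reported degrees of freedom alone do not tell which was applied.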

If the omnibus F-test was significant, post-hoc linear trend analyses and correlational analyses were conducted in order to gain additional insight into the relationships between community size and lexical characteristics. If the post-hoc linear trend analysis and the correlation between a lexical characteristic and community size are both significant, this implies that the magnitude of the mean lexical characteristic of each of the 17 communities varies with the size of the community. Figure 1 shows the relationship between these lexical characteristics and community size.

Table 3 | Summary of statistical analyses for real and random communities.

Lexical characteristic    F-test (1)                        Linear contrast (2)

REAL COMMUNITIES
Word length               F(16, 261) = 35.12, p < 0.001     F(1, 104) = 100.00, p < 0.001
Familiarity               F(16, 199) = 3.28, p < 0.001      F(1, 124) = 16.39, p < 0.001
Word frequency            F(16, 234) = 12.73, p < 0.001     F(1, 116) = 46.80, p < 0.001
Neighborhood density      F(16, 271) = 166.87, p < 0.001    F(1, 1666) = 1412.58, p < 0.001
Neighborhood frequency    F(16, 288) = 39.34, p < 0.001     F(1, 113) = 158.59, p < 0.001
Positional probability    F(16, 183) = 53.25, p < 0.001     F(1, 84.4) = 11.03, p < 0.01
Biphone probability       F(16, 259) = 101.72, p < 0.001    F(1, 121) = 116.48, p < 0.001
Age of acquisition        F(16, 214) = 9.73, p < 0.001      F(1, 91.3) = 49.58, p < 0.001

RANDOM COMMUNITIES
Word length               F(16, 6490) = 0.59, p = 0.90      n/a
Familiarity               F(16, 6490) = 0.731, p = 0.76     n/a
Word frequency            F(16, 6490) = 1.15, p = 0.30      n/a
Neighborhood density      F(16, 252) = 1.77, p < 0.05       F(1, 145) = 2.50, p = 0.12
Neighborhood frequency    F(16, 6490) = 1.20, p = 0.26      n/a
Positional probability    F(16, 6490) = 1.859, p < 0.05     F(1, 6490) = 0.11, p = 0.74
Biphone probability       F(16, 6490) = 1.332, p = 0.17     n/a
Age of acquisition        F(16, 5569) = 0.94, p = 0.52      n/a

(1) Corrected F-tests were conducted using corrected degrees of freedom if Levene's test of homogeneity of variances was significant. (2) The linear trend post-hoc contrast was conducted only if the omnibus F-test was statistically significant.
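The linear trend contrasts in Table 3 also have non-integer error degrees of freedom, which is consistent with a separate-variance (Welch-type) contrast. The sketch below shows one such contrast under two assumptions not stated in the text: equally spaced weights over the 17 communities ordered by size (−8, −7, ..., 8), and the Welch-Satterthwaite approximation for the degrees of freedom. The toy groups are hypothetical.

```python
import statistics


def linear_trend_weights(k):
    """Equally spaced, centred weights for k ordered groups, e.g. k=3 -> [-1, 0, 1]."""
    centre = (k - 1) / 2
    return [j - centre for j in range(k)]


def welch_contrast(groups, weights):
    """Separate-variance contrast test; returns (F, df) with F = t ** 2 on (1, df)."""
    n = [len(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    psi = sum(c * m for c, m in zip(weights, means))          # contrast estimate
    terms = [c ** 2 * v / nj for c, v, nj in zip(weights, variances, n)]
    se2 = sum(terms)                                          # squared standard error
    df = se2 ** 2 / sum(t ** 2 / (nj - 1) for t, nj in zip(terms, n))
    return psi ** 2 / se2, df


toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]   # hypothetical, ordered by size
F_lin, df_lin = welch_contrast(toy_groups, linear_trend_weights(3))
print(round(F_lin, 3), round(df_lin, 2))  # 6.0 4.0
```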



Word length 

Word length was measured by counting the number of phonemes in a given word. The 1-way ANOVA was significant, F (16, 261) = 35.12, p < 0.001, indicating that some communities contained mostly long words and other communities contained mostly short words. The post-hoc linear contrast [F (1, 104) = 100.00, p < 0.001] and the correlation between mean length and community size were also significant (r = −0.653, df = 15, p < 0.01), indicating that larger communities tend to consist of shorter words (i.e., words containing fewer phonemes), whereas smaller communities tend to consist of longer words.
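Each correlation in this section is computed over the 17 community means, so df = 17 − 2 = 15. The sketch below shows the Pearson correlation and the t statistic used to test it; the helper names are hypothetical. For the reported r = −0.653, |t| ≈ 3.34 exceeds 2.947, the two-tailed critical value of t with 15 df at α = 0.01, consistent with the reported p < 0.01.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, df):
    """t statistic for H0: rho = 0, where df = number of pairs - 2."""
    return r * math.sqrt(df / (1 - r ** 2))


# Reported correlation between mean word length and community size.
t = t_from_r(-0.653, 15)
print(round(t, 2))  # -3.34
```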

Subjective familiarity 

Subjective familiarity values were obtained on a 7-point scale, 
such that words with higher familiarity scores were perceived to 
be more familiar (Nusbaum et al., 1984). The 1-way ANOVA 
was significant, F (16, 199) = 3.28, p < 0.001, indicating that 
some communities contained mostly familiar words and other 
communities contained mostly unfamiliar words. The post-hoc 
linear contrast [F (1, 124) = 16.39, p < 0.001] and the corre- 
lation between mean familiarity and community size were also 
significant (r = 0.560, df = 15, p < 0.05), indicating that larger 
communities tended to consist of highly familiar words, whereas 
smaller communities tended to consist of less familiar words. 

Word frequency 

Word frequency refers to how often a given word occurs in a language, and log-base 10 of the raw frequency counts from Kucera and Francis (1967) were used in the present analyses. The 1-way ANOVA was significant, F (16, 234) = 12.73, p < 0.001, indicating that some communities contained mostly high frequency words and other communities contained mostly low frequency words. The post-hoc linear contrast [F (1, 116) = 46.80, p < 0.001] and the correlation between mean word frequency and community size were also significant (r = 0.753, df = 15, p < 0.001), indicating that larger communities tended to consist of more frequent words, whereas smaller communities tended to consist of less frequent words.

The finding that larger communities tend to consist of frequent and short words is reminiscent of Zipf's (1935) more general observation that short words also tend to be very frequent words in a language. Thus, the overall structure of the language appears to be reflected in the communities observed in the present study, much like the structure of a fractal is observed at both large and small scales.

Furthermore, given Zipf's (1935) additional observation that 
there are few high frequency words and many low frequency 
words in a language, it is interesting to note that most of these 
high frequency words are found in the largest communities. These 
large communities consisting of high frequency words may reflect 
sections of the giant component where a large amount of cog- 
nitive processing occurs compared to other parts of the lexical 
network. 

Neighborhood density 

Neighborhood density refers to the number of words that are 
phonologically similar to a given word (Luce and Pisoni, 1998). 
Phonological similarity is defined as the substitution, addi- 
tion, or deletion of one phoneme in a given word to form a 
phonological neighbor. Note that this is identical to the criterion used to decide whether two words in the network used in the present analysis should be connected by an edge, and neighborhood density is therefore equivalent to the network science term degree. The 1-way ANOVA was significant, F (16, 271) = 166.87, p < 0.001, indicating that some communities contained mostly words of high neighborhood density (or high degree) and other communities contained mostly words of low neighborhood density (or low degree).

FIGURE 1 | Plots of mean lexical characteristics of each community against community sizes. The x-axis represents the number of words residing in each community. The y-axis represents the mean lexical characteristics for each of the 17 communities. The dashed line represents the best-fit line.

The post-hoc linear contrast [F (1, 1666) = 1412.58, p < 0.001] and the correlation between mean neighborhood density and community size were also significant (r = 0.802, df = 15, p < 0.001), indicating that larger communities tended to consist of high-density words (or nodes with high degree), whereas smaller communities tended to consist of low-density words (or nodes with low degree). This result is in line with the idea that communities are simply sub-graphs of the original network (Ravasz and Barabasi, 2003). The largest communities would be analogous to the giant component of a network, and smaller communities analogous to islands. As the giant component is a very densely connected section of the network compared to the disconnected components (islands), one might expect words in the larger communities to be of higher degree than words in the smaller communities.
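The one-phoneme neighbor rule that defines both neighborhood density and the edges of the network can be sketched as follows. Words are assumed to be tuples of phoneme symbols, and the transcriptions and helper names are hypothetical illustrations, not entries from the lexicon used in the study.

```python
def is_neighbor(a, b):
    """True if phoneme sequences a and b differ by exactly one substitution,
    addition, or deletion of a single phoneme."""
    if a == b:
        return False
    la, lb = len(a), len(b)
    if la == lb:                                    # one substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(la - lb) != 1:
        return False
    short, long_ = (a, b) if la < lb else (b, a)
    # One addition/deletion: removing a single phoneme from the longer
    # sequence must yield the shorter one.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def degree(word, lexicon):
    """Neighborhood density, i.e. the word's degree in the phonological network."""
    return sum(is_neighbor(word, other) for other in lexicon)


# Hypothetical transcriptions (tuples of phoneme symbols).
cat, bat, at = ("k", "æ", "t"), ("b", "æ", "t"), ("æ", "t")
cast, dog = ("k", "æ", "s", "t"), ("d", "ɔ", "g")
lexicon = [cat, bat, at, cast, dog]
print(degree(cat, lexicon))  # 3 (bat, at, cast)
```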

Neighborhood frequency 

Neighborhood frequency is the mean word frequency of a word's phonological neighbors. Log-base 10 values of word frequency counts were obtained from Kucera and Francis (1967). The 1-way ANOVA was significant, F (16, 288) = 39.34, p < 0.001, indicating that some communities contained mostly words with high frequency neighbors and other communities contained mostly words with low frequency neighbors. The post-hoc linear contrast [F (1, 113) = 158.59, p < 0.001] and the correlation between mean neighborhood frequency and community size were also significant (r = 0.741, df = 15, p < 0.001), indicating that larger communities tended to consist of words with high frequency neighbors, whereas smaller communities tended to consist of words with low frequency neighbors. Again, this is consistent with the previously mentioned finding that high frequency words tend to occur in large communities.

Phonotactic probability 

The phonotactic probability of a word refers to the probability 
that a segment occurs in a certain position of a word (positional 
segment probability), and the probability that two adjacent seg- 
ments co-occur (biphone probability; Vitevitch and Luce, 1998). 
These values were obtained from the Phonotactic Probability Calculator³ (Vitevitch and Luce, 2004). The 1-way ANOVA for
positional probability was significant, F (16, 183) = 53.25, p < 
0.001, indicating that some communities contained words of high 
positional probability and other communities contained words of 
low positional probability. The post-hoc linear contrast was sig- 
nificant, F (1, 84.4) = 11.03, p < 0.01; however, the correlation 
between mean positional probability and community size was not 
significant, p = 0.32. 

The 1-way ANOVA for biphone probability was significant, 
F (16, 259) = 101.72, p < 0.001, indicating that some commu- 
nities contained words of high biphone probability and other 
communities contained words of low biphone probability. The 
post-hoc linear contrast was significant, F (1, 121) = 116.48, p < 0.001; however, the correlation between mean biphone probability and community size was only marginally significant, p = 0.08.
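The two phonotactic measures can be illustrated with a simplified, unweighted sketch. It assumes words are tuples of phoneme symbols and normalizes position-specific counts by the number of words long enough to have that position; the Phonotactic Probability Calculator additionally weights counts by log word frequency, so its values will differ. Summing over a word's segments or biphones is shown as one common way to obtain a word-level score. All names and transcriptions are hypothetical.

```python
from collections import Counter


def positional_probs(lexicon):
    """P(segment s at position i) = (#words with s at i) / (#words with a position i)."""
    seg_counts, pos_totals = Counter(), Counter()
    for word in lexicon:
        for i, seg in enumerate(word):
            seg_counts[(i, seg)] += 1
            pos_totals[i] += 1
    return {key: c / pos_totals[key[0]] for key, c in seg_counts.items()}


def biphone_probs(lexicon):
    """P(biphone at positions i, i+1), normalised by #words with a biphone at i."""
    bi_counts, pos_totals = Counter(), Counter()
    for word in lexicon:
        for i in range(len(word) - 1):
            bi_counts[(i, word[i], word[i + 1])] += 1
            pos_totals[i] += 1
    return {key: c / pos_totals[key[0]] for key, c in bi_counts.items()}


def word_scores(word, pos_p, bi_p):
    """Summed positional segment probability and summed biphone probability."""
    seg = sum(pos_p.get((i, s), 0.0) for i, s in enumerate(word))
    bi = sum(bi_p.get((i, word[i], word[i + 1]), 0.0) for i in range(len(word) - 1))
    return seg, bi


# Hypothetical toy lexicon.
lexicon = [("k", "æ", "t"), ("b", "æ", "t"), ("k", "æ", "p")]
seg, bi = word_scores(("k", "æ", "t"), positional_probs(lexicon), biphone_probs(lexicon))
print(round(seg, 3), round(bi, 3))  # 2.333 1.333
```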

It is important to note that phonotactic probability on its own 
does not tell us the underlying phonological structure of each 
community, as phonotactic probability is a value that indicates 
the frequency of occurrence of a phoneme in a particular position 
(or the co-occurrence of two phonemes in the case of biphone 
probability) in a given language. It is possible for two communi- 
ties to have similar mean phonotactic probabilities, but different 
types or combinations of phonemes and biphones could have 
contributed to this value. Therefore, to investigate the phonologi- 
cal structure of communities, additional analyses on the biphone 
frequencies for each community were conducted (see section 
"Raw Biphone Counts"). 

Age of acquisition 

Age of acquisition refers to the age at which a particular word 
was learned (e.g., Ghyselinck et al., 2004). Age of acquisition 
ratings are typically obtained by asking participants to indicate 
the age at which a particular word was learned (e.g., Cortese 
and Khanna, 2008; Kuperman et al., 2012). Ratings for 5,568 words were obtained from the Kuperman et al. (2012) megastudy. As ratings were not available for the other 940 words in the giant component, these words were not included in these analyses.

3 The Phonotactic Probability Calculator is available at http://www.people.ku.edu/~mvitevit/PhonoProbHome.html

The 1-way ANOVA for age of acquisition ratings was sig- 
nificant, F (16, 214) = 9.73, p < 0.001, indicating that some 
communities contained words with high age of acquisition ratings 
and other communities contained words with low age of acquisition ratings. The post-hoc linear contrast [F (1, 91.3) = 49.53, p < 0.001] and the correlation between mean age of acquisition ratings and community sizes were also significant (r = −0.739, df = 15, p < 0.001), indicating that larger communities tended to consist of words with low age of acquisition ratings, whereas smaller communities tended to consist of words with high age of acquisition ratings. Given the well-documented observation that high frequency words tend to be words that are also acquired earlier in life (Ghyselinck et al., 2004; Kuperman et al., 2012), as well
as the finding that high frequency words tend to reside in larger 
communities, it is perhaps not surprising that larger communi- 
ties tend to consist of words with low age of acquisition ratings 
(i.e., acquired at a younger age). Nevertheless, this is a potentially 
important finding because it suggests that the larger communi- 
ties are formed earlier than smaller communities, and could have 
implications for understanding language acquisition and growth 
dynamics of a language network (Steyvers and Tenenbaum, 2005). 

RANDOM COMMUNITIES 

In summary, ANOVAs and post-hoc linear trend analyses for all lexical characteristics were significant for real communities. Turning to the random communities, only the ANOVAs for neighborhood density and positional probability were significant, both Fs < 1.86, both ps < 0.05; however, the post-hoc linear trend analyses were not significant, both Fs ≤ 2.50, both ps ≥ 0.12.

The absence of a significant linear trend for the random communities suggests that the significant ANOVAs for neighborhood density and positional probability may be spurious. To assess this possibility, 4 additional sets of random communities were generated in the same manner and 1-way ANOVAs were conducted on them. None of the ANOVAs on the 4 new sets of randomly generated communities were significant⁴.
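The alternative baseline mentioned in footnote 4 (many random partitions and a 95% interval per community) can be sketched as follows. The footnote does not say how the intervals were computed; this sketch assumes empirical 2.5th and 97.5th percentiles over the random sets, and the function name and toy values are hypothetical.

```python
import random
import statistics


def random_baseline_intervals(values, sizes, n_sets=100, seed=0):
    """Percentile intervals for community means under random partitioning.

    `values` holds one lexical measure per word (e.g., log frequency) and
    `sizes` the community sizes. For each community slot, returns the
    (2.5th, 97.5th) percentiles of its mean over `n_sets` random partitions.
    """
    if sum(sizes) != len(values):
        raise ValueError("community sizes must add up to the number of values")
    rng = random.Random(seed)
    slot_means = [[] for _ in sizes]
    for _ in range(n_sets):
        shuffled = list(values)
        rng.shuffle(shuffled)
        start = 0
        for slot, size in enumerate(sizes):
            slot_means[slot].append(statistics.mean(shuffled[start:start + size]))
            start += size
    intervals = []
    for means in slot_means:
        cuts = statistics.quantiles(means, n=40)  # cut points every 2.5%
        intervals.append((cuts[0], cuts[-1]))
    return intervals


intervals = random_baseline_intervals(list(range(100)), [10, 20, 70])
print([(round(lo, 1), round(hi, 1)) for lo, hi in intervals])
```

A real community whose mean falls outside its slot's interval differs from what random grouping of the same number of words would typically produce.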

The fact that most of the ANOVAs for the random communities were not significant, whereas all ANOVAs for the real communities were significant, implies that the communities extracted by the community detection algorithm are not simply random groupings of words, but capture important relationships among words in the phonological network, such as the finding that larger communities tended to consist of shorter words of higher frequency and neighborhood density than smaller communities.

4 As suggested by a reviewer, an alternative baseline could be constructed by generating 100 sets of random communities and computing 95% confidence intervals of the lexical characteristics for each of the 17 communities to determine whether the mean lexical characteristics of the real communities were significantly different from those of the random communities. The mean lexical characteristics for the 17 real communities did not fall within these confidence intervals [except for Community 11 (word frequency), Community 15 (neighborhood density), and Community 14 (age of acquisition)], which indicates that the pattern of results observed in the ANOVAs for the real network was not spurious and further supports the results of the ANOVAs conducted on the 5 sets of random communities. A table of the confidence intervals of the 100 sets of random communities is included in the Supplementary Materials.

Note that the correlations between community sizes and community means of lexical characteristics reported for the real communities are consistent with the patterns reported in previous work (e.g., Zipf, 1935; Landauer and Streeter, 1973; Frauenfelder et al., 1993). For instance, larger communities tend to contain shorter and more frequent words, which is the same pattern obtained by Zipf's (1935) analysis of the overall frequency of words in a language: words that occur frequently in corpora are also short words. This suggests that these patterns may have implications for lexical processing and language evolution.

RAW BIPHONE COUNTS 

As mentioned in the Introduction, Auer and Luce (2005) pointed 
out that current measures of phonotactic probability might not 
allow us to detect and hence assess the influence of longer phono- 
logical segments on lexical processing. To investigate whether 
communities consist of words which contain similar phonolog- 
ical segments, raw counts of biphones found in words belonging 
to the same community, henceforth referred to as raw biphone 
counts, were obtained from each of the 17 real and random com- 
munities. Note that these raw biphone counts represent a measure 
of how often a particular biphone occurs within each community. 
On the other hand, the positional and biphone probabilities that 
were analyzed in the ANOVAs above represent how often a par- 
ticular segment occurs at a certain word position and the overall 
probability of occurrence of those biphones within a corpus of 
words respectively (Vitevitch and Luce, 2004), and hence do not 
directly indicate whether similar phonological segments occur 
in the same community. It should be emphasized that the raw 
biphone counts obtained for each community are not position- 
specific and do not represent overall frequencies in a language, 
unlike commonly used phonotactic measures in the literature. 
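Raw biphone counts, as defined above, ignore position and are not normalized. A minimal sketch, assuming each word is a tuple of phoneme symbols; the function name and the transcriptions of brink and drink are hypothetical illustrations.

```python
from collections import Counter


def raw_biphone_counts(community):
    """Count every adjacent phoneme pair in the words of one community.

    Position is ignored and counts are not normalised, unlike positional
    or biphone probabilities computed over a whole lexicon.
    """
    counts = Counter()
    for word in community:
        counts.update(zip(word, word[1:]))   # adjacent pairs (x, y)
    return counts


# Hypothetical transcriptions of two words sharing a segment.
community = [("b", "ɹ", "ɪ", "ŋ", "k"), ("d", "ɹ", "ɪ", "ŋ", "k")]
counts = raw_biphone_counts(community)
print(counts.most_common(3))
```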

Two-sample Kolmogorov-Smirnov (K-S) tests were conducted to compare the raw biphone counts found in the real and random communities of the same size. The results are summarized in Table 4. All K-S tests were significant, Ds > 0.10, ps < 0.05, except for Communities 11, 14, and 17, which were not significant, and Community 16, which was marginally significant, D = 0.09, p = 0.077. These results indicate that the raw biphone counts of communities obtained using community detection methods were significantly different from the raw biphone counts of randomly generated communities.
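The two-sample K-S statistic is the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes D exactly and an asymptotic p-value from the Kolmogorov distribution with a common small-sample adjustment of λ; this approximation is an assumption of the sketch and is imprecise for small or heavily tied samples such as counts, so established tools (e.g., SciPy's `scipy.stats.ks_2samp`) may report different p-values.

```python
import math
from bisect import bisect_right


def kolmogorov_q(lam):
    """Q_KS(lam) = 2 * sum_{j>=1} (-1)**(j-1) * exp(-2 j^2 lam^2)."""
    if lam < 0.2:
        return 1.0          # the series equals 1 to many decimal places here
    total = sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, 101))
    return min(1.0, max(0.0, 2 * total))


def ks_two_sample(x, y):
    """Return (D, approximate p) for two samples of numbers."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:   # the ECDF gap can only peak at an observed value
        fx = bisect_right(xs, t) / n
        fy = bisect_right(ys, t) / m
        d = max(d, abs(fx - fy))
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, kolmogorov_q(lam)


d, p = ks_two_sample([1, 2, 3], [4, 5, 6])       # completely separated samples
d0, p0 = ks_two_sample([1, 2, 3, 4], [1, 2, 3, 4])  # identical samples
print(d, round(p, 3), d0, p0)
```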

Figures 2 and 3 show the raw biphone counts from real and random communities 1 and 15. In these figures, the sequence of biphones on the x-axis is the same for both real and random communities and is arranged in decreasing order of frequency in the real community. Two things are clear from the figures. First, random communities contain a larger number of different biphones than the real communities. Second, the raw counts of biphones found in real communities are much larger than those of the same biphones in random communities. Taken together, this strongly suggests that communities in the phonological network consist of words that share particular phonological segments.



Table 4 | Summary of Kolmogorov-Smirnov tests for raw biphone counts of real and random communities.

Community    D-statistic    p-value
1            0.366          0.001**
2            0.288          0.001**
3            0.287          0.007**
4            0.296          <0.001***
5            0.309          <0.001***
6            0.175          0.002**
7            0.174          0.001**
8            0.136          0.005**
9            0.149          0.002**
10           0.107          0.025*
11           0.077          0.191
12           0.119          0.004**
13           0.127          0.004**
14           0.066          0.295
15           0.127          0.002**
16           0.086          0.077+
17           0.071          0.202

***p < 0.001, **p < 0.01, *p < 0.05, +p < 0.10.





FIGURE 2 | Raw biphone counts of real and random community 1 (panels: real community 1, random community 1; x-axis: biphones). The x-axis represents the different biphones found within these communities, and the biphones (on both x-axes) were arranged based on their frequency of occurrence in the real community in descending order.

FIGURE 3 | Raw biphone counts of real and random community 15. The x-axis represents the different biphones found within these communities, and the biphones (on both x-axes) were arranged based on their frequency of occurrence in the real community in descending order.

From a visual inspection of Figures 2 and 3, it is clear that certain biphones are overrepresented in the real communities compared to random communities. It is interesting to note that there are relatively few biphones that occur frequently within a community, and a large number of biphones that occur rarely. Furthermore, this pattern was observed in all 17 real communities, but not in randomly generated communities. This pattern is also reminiscent of Zipf's (1935) finding that within a language there are few words that occur at very high frequencies but many words that occur less frequently. Although the biphone distribution of each community consists of different biphones, biphone distributions at the community level appear to mirror the overall frequency-of-occurrence pattern of words in a language.
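The "few frequent, many rare" pattern described above was identified by visual inspection. One rough way to summarize such a rank-frequency distribution numerically is the least-squares slope of log count against log rank; a Zipf-like distribution gives a slope near −1 and a flat distribution a slope near 0. This diagnostic and the helper name are illustrative assumptions, not an analysis reported in the study, and a log-log fit is not a rigorous test for a power law.

```python
import math


def rank_frequency_slope(counts):
    """Least-squares slope of log(count) on log(rank), rank 1 = most frequent."""
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


ideal_zipf = [1000 / r for r in range(1, 51)]   # hypothetical ideal Zipf counts
print(round(rank_frequency_slope(ideal_zipf), 3))
```

Applied to the raw biphone counts of one community, the slope gives a single number per community that can be compared between real and random communities.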

Strikingly, the most frequent biphones in real communities can be concatenated to form longer phonological segments. For instance, in community 1, the most frequently occurring biphones are /ɪn/, /bɪ/ and /ɜb/, which can be concatenated to form a longer phonological segment, /ɜbɪn/ ("urban"), that is also found in other words in that community, such as urban, turbine and bourbon. Similarly, in community 15, the most frequently occurring biphones are /ŋk/, /ɪŋ/ and /ɹɪ/, which can be concatenated to form a longer phonological segment, /ɹɪŋk/ ("rink"), that is also found in other words in that community, such as brink, drink and wrinkle. Thus, a large proportion of words in these communities contain these particular phonological segments, and words in a community may simply be phonological variants of each other to varying degrees.
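The concatenation of overlapping biphones described above can be made explicit with a small helper. This is a hypothetical sketch that assumes the selected biphones form a single open chain (each phoneme starts at most one biphone and the chain has exactly one start); the IPA symbols follow the community 15 example.

```python
def chain_biphones(biphones):
    """Concatenate overlapping biphones (x, y), (y, z), ... into one segment."""
    successor = {a: b for a, b in biphones}
    seconds = {b for _, b in biphones}
    starts = [a for a, _ in biphones if a not in seconds]
    if len(starts) != 1:
        raise ValueError("biphones do not form a single open chain")
    segment = [starts[0]]
    # Follow successors; the length guard prevents looping on malformed input.
    while segment[-1] in successor and len(segment) <= len(biphones):
        segment.append(successor[segment[-1]])
    return segment


print("".join(chain_biphones([("ŋ", "k"), ("ɪ", "ŋ"), ("ɹ", "ɪ")])))  # ɹɪŋk
```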

DISCUSSION 

Using the Louvain method in Gephi, 17 communities were 
extracted from the giant component of the phonological network 
in Vitevitch (2008). Modularity, Q, was 0.655, which is much 
higher than the modularity values of the random ER networks 
and indicates the presence of strong community structure in the 
phonological network. 

Additional analyses were conducted for both real and random communities to compare various lexical characteristics of words in different communities. Generally, ANOVAs and post-hoc linear trend analyses were significant for real communities but not for random communities. This indicated the presence of mean differences in lexical characteristics of words belonging to different communities, and linear contrasts suggested that the pattern of these differences was related to the size of the community. The results of these analyses were consistent with the prediction that larger communities tended to consist of short, frequent words of high degree, whereas smaller communities tended to consist of longer, less frequent words of low degree. Although ANOVAs were conducted separately for each lexical variable, these variables are by no means independent of each other; in fact, the present findings are consistent with previously observed patterns of correlations between various lexical variables (e.g., Zipf, 1935; Frauenfelder et al., 1993).

Raw biphone counts of real and random communities were 
also obtained and compared to uncover underlying patterns of 
phonological segments present in the extracted communities. 
K-S tests comparing real and random communities were signif- 
icant for the majority of the 17 communities, indicating that 
the number of different phonological segments, as well as their 
raw counts (which represent the number of occurrences of that 
segment within a community), found in real communities were 
significantly different from that of random communities. The 
pattern that there are relatively few biphones that occur very fre- 
quently within communities and a large number of biphones 
that occur rarely is reminiscent of the pattern of the overall 
frequency of words in a language (Zipf, 1935). In addition, it 
should be highlighted that communities do not appear to be organized exclusively by cohorts or rimes, linguistic constructs that are commonly studied in the psycholinguistic literature (e.g., Marslen-Wilson, 1987; Norris et al., 2002). It is interesting to note that these linguistic constructs are not explicit features that are "built into" the organization of the phonological network.

Psycholinguistic research has traditionally focused on study- 
ing how various lexical characteristics of individual words, such 
as word frequency and neighborhood density, influence vari- 
ous aspects of lexical processing (e.g., Luce and Pisoni, 1998). 
Indeed, the existing literature has shown that these micro-level 
characteristics exert measurable and robust effects on spoken 
word recognition and production (e.g., Savin, 1963; Luce and 
Pisoni, 1998; Vitevitch and Luce, 1999). On the other hand, recent 
applications of network science to the study of both seman- 
tic and phonological language networks have revealed that the 
macro-level characteristics, such as average path length, average 
clustering coefficient and degree distribution, of these networks 
resemble that of other complex networks (e.g., Steyvers and 
Tenenbaum, 2005; Vitevitch, 2008). 

Although both of these approaches have revealed important 
aspects about the micro- and macro-level of the phonological 
network, the present approach of applying a community detec- 
tion method to this network has exposed the presence of robust 
community structure in the phonological network. Community 
structure can be viewed as the meso-level of the network, which 
describes the connectivity of the network at an intermediate level, 
rather than at the level of individual nodes or at the level of the 
entire network. As most complex networks comprise a hierarchy of larger and smaller communities (Ravasz and Barabasi, 2003), there exist different layers and levels of connectivity within the network, including the more frequently studied micro- and macro-levels. Below I provide some examples of how the present findings with respect to the meso-level of the phonological network can provide deeper insights into lexical processing and language acquisition, beyond those offered by traditional psycholinguistic variables, as well as implications for the evolution of natural language.

IMPLICATIONS FOR LEXICAL PROCESSING 

The community structure of a network can inform our understanding of how information spreads within various networks (Lancichinetti et al., 2010; Kitchovitch and Lio, 2011; Wu et al., 2011). Similarly, the presence of community structure in the phonological network may help explain the dynamics of the spread of activation among words and how these dynamics influence lexical retrieval.

According to the spreading activation mechanism described in Chan and Vitevitch (2009, 2010; see also Vitevitch et al., 2011), when a word is activated, activation spreads to the phonological neighbors of that word, and activation can also spread from these phonological neighbors back to the word that was initially activated. Such a mechanism has been used to explain why words with high clustering coefficients are recognized more slowly than words with low clustering coefficients: as activation becomes trapped within a densely connected local structure, it is difficult for a word with a high clustering coefficient to "stand out" among other phonologically similar words and be subsequently recognized (Chan and Vitevitch, 2009).
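The trapping account can be illustrated with a deliberately simplified spreading activation model. In the Python sketch below, which is a hypothetical toy model and not a model taken from Chan and Vitevitch (2009), each node keeps a fixed fraction of its activation on every time step and divides the remainder equally among its neighbors. The target word has four neighbors in both toy networks; only the links among those neighbors, and hence the target's clustering coefficient, differ. The retention rate of 0.5 and the two-step horizon are arbitrary assumptions.

```python
def build(edges):
    """Undirected graph as a dict mapping each node to its set of neighbors."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def clustering_coefficient(graph, node):
    """Fraction of possible links among a node's neighbors that actually exist."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return links / (k * (k - 1) / 2)

def spread(graph, source, steps, retain=0.5):
    """Toy spreading activation: each node keeps `retain` of its activation and
    splits the rest equally among its neighbors, so total activation is conserved."""
    act = {node: 0.0 for node in graph}
    act[source] = 1.0
    for _ in range(steps):
        new = {node: retain * a for node, a in act.items()}
        for node, a in act.items():
            nbrs = graph[node]
            if not nbrs:  # an isolated node keeps all of its activation
                new[node] += (1 - retain) * a
                continue
            for nbr in nbrs:
                new[nbr] += (1 - retain) * a / len(nbrs)
        act = new
    return act

NEIGHBORS = ["n1", "n2", "n3", "n4"]
# Low clustering: the neighbors are linked only to the target.
low_c = build([("target", n) for n in NEIGHBORS])
# High clustering: the neighbors are also all linked to one another.
high_c = build([("target", n) for n in NEIGHBORS]
               + [(a, b) for i, a in enumerate(NEIGHBORS) for b in NEIGHBORS[i + 1:]])

for label, g in [("low C", low_c), ("high C", high_c)]:
    print(label, clustering_coefficient(g, "target"), spread(g, "target", steps=2)["target"])
# low C 0.0 0.5
# high C 1.0 0.3125
```

In the low-clustering network, activation that reaches the neighbors can only flow back to the target; in the high-clustering network, part of it circulates among the neighbors, so the target ends up with less activation relative to its competitors.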

With respect to community structure, activation may likewise tend to be trapped within a community via the same mechanism, especially as words within a community are, by definition, more densely connected to each other than to words in other communities. This possibility has several potential implications for lexical access.
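The same toy dynamics can be applied at the meso-level: seed activation in one of two hypothetical communities (two four-word cliques joined by a single bridging link) and track how much activation remains inside the seeded community. As before, the network and the retention rate are illustrative assumptions in this Python sketch, not properties of the actual phonological network.

```python
from itertools import combinations

def build(edges):
    """Undirected graph as a dict mapping each node to its set of neighbors."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def spread(graph, source, steps, retain=0.5):
    """Toy spreading activation: keep `retain`, share the rest equally with
    neighbors (every node in this example has at least one neighbor)."""
    act = {node: 0.0 for node in graph}
    act[source] = 1.0
    for _ in range(steps):
        new = {node: retain * a for node, a in act.items()}
        for node, a in act.items():
            for nbr in graph[node]:
                new[nbr] += (1 - retain) * a / len(graph[node])
        act = new
    return act

# Hypothetical network: community A = a0..a3, community B = b0..b3, bridge a3-b0.
A = ["a0", "a1", "a2", "a3"]
B = ["b0", "b1", "b2", "b3"]
graph = build(list(combinations(A, 2)) + list(combinations(B, 2)) + [("a3", "b0")])

for steps in (1, 3, 10):
    act = spread(graph, "a0", steps)
    share_in_a = sum(act[n] for n in A)
    print(steps, round(share_in_a, 3))
```

Because the two groups meet only through one link, activation leaks into community B slowly (after three steps roughly 95% of it is still in A). Run long enough, this particular toy process would eventually spread activation across both groups; the sketch only shows that a sparse bridge delays that leakage.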

If one conceptualizes lexical retrieval as a search problem within long-term memory, analogous to searching for a "patch" (a cluster of items) in memory from which to retrieve a target item (Hills et al., 2012), then higher activation levels of words within one community compared to other communities in the network could facilitate lexical retrieval by narrowing the search space from the entire network to a smaller community. In fact, this could explain the phonotactic effects observed in speech perception and production: since communities tend to consist of words that share similar phonological segments, recognition of a target word that shares these same segments could be facilitated by the higher overall activation level of the community to which the target word belongs. Conversely, words that contain segments of low phonotactic probability may not be recognized as quickly because the other words that share those less common segments do not constitute a robust community within the network.

Conceiving of phonotactic effects as an emergent property of the network's community structure could resolve the apparent contradiction between facilitatory phonotactic effects and inhibitory neighborhood density effects in spoken word recognition. As mentioned earlier, these effects are contradictory because words that belong to dense neighborhoods also tend to contain common phonological segments. It is possible, however, that these effects arise at different levels, or resolutions, of the network: community structure at the meso-level reflects the grouping of words according to their phonological segments, whereas neighborhood density reflects the degree of a word (i.e., its number of phonological neighbors), a micro-level feature that captures the word's local network structure. This distinction between resolutions of the network is akin to the framework of sublexical and lexical representations proposed by Vitevitch and Luce (1999), who suggested that different experimental tasks might emphasize the processing of either sublexical or lexical representations. An alternative but analogous way of understanding the distinct effects of phonotactic probability and neighborhood density could involve specifying the "resolution" of the lexical processes elicited by the experimental task, which would indicate the level of the phonological network that is emphasized during processing. The present finding that phonotactic probability "emerged" from the meso-level organization of lexical forms in the phonological network strongly suggests that phonotactic probability and neighborhood density are not distinct, disparate features of phonological word forms; rather, their effects arise depending on which level, or resolution, of the network is emphasized by the processing task.
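Because neighborhood density is simply a word's degree, this micro-level measure can be computed directly from a word list. The standard definition treats two words as phonological neighbors if they differ by the substitution, addition, or deletion of a single phoneme. The Python sketch below assumes, for simplicity, that each character of a string stands for one phoneme, and it uses a small hypothetical lexicon; a real analysis would operate on phonemic transcriptions.

```python
def is_neighbor(a, b):
    """True if a and b differ by exactly one phoneme substitution, addition,
    or deletion (one character = one phoneme, by assumption)."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:
        longer, shorter = (a, b) if len(a) > len(b) else (b, a)
        # Deleting one phoneme from the longer form must yield the shorter form.
        return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))
    return False

def neighbors(word, lexicon):
    """Sorted list of the word's phonological neighbors in the lexicon."""
    return sorted(w for w in lexicon if is_neighbor(word, w))

# Hypothetical mini-lexicon (strings standing in for phonemic transcriptions).
LEXICON = ["kat", "bat", "kit", "at", "kast", "dog", "kats"]
print(neighbors("kat", LEXICON))       # ['at', 'bat', 'kast', 'kats', 'kit']
print(len(neighbors("kat", LEXICON)))  # 5: the degree, i.e., neighborhood density
```

By contrast, community membership, the meso-level property, cannot be read off a single word's neighbor list; it depends on how the neighbors' own connections are arranged across the wider network.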

Another particularly striking finding is that larger communities tend to consist of shorter words of high frequency and high density. The fact that high frequency words tend to reside in large communities implies that a large proportion of activation occurs in a particular region of the network, because these words are frequently activated and retrieved. Since these high frequency words also tend to have high neighborhood density (i.e., degree), this further implies that a substantial proportion of activation is trapped within this region, as many phonological neighbors also compete for recognition. It is interesting to note that these high frequency words are grouped together in large communities even though there are relatively few high frequency words compared to low frequency words in the lexicon, in accordance with Zipf's (1935) law. Based on the spreading activation mechanism proposed by Chan and Vitevitch (2009), such an organization does not appear to be very efficient for lexical processing, as it is difficult for any single word to "stand out" from the other words in the same community. On the other hand, words that belong to larger communities are also acquired relatively early in life. It is possible that acquiring certain words earlier affords them a processing advantage, for example, by making their meanings more easily accessible (Brysbaert et al., 2000), so that these words can be readily retrieved in spite of their network structure. These are interesting and important hypotheses that deserve to be studied in greater detail.

Finally, the presence of phonological sequences consisting of more than two phonemes lends some credence to the hypothesis put forward by Auer and Luce (2005), who speculated that such sequences could influence speech perception and production. Stimuli could be selected from the communities observed in the present analyses to test this hypothesis empirically.
IMPLICATIONS FOR LANGUAGE ACQUISITION

Although the present work has implications for other aspects of lexical processing, such as understanding speech errors and the learning of new words or a second language, it is beyond the scope of this paper to discuss all of these areas. Here I discuss a second area in which the present findings could have important implications: language acquisition.

Community structure could potentially contribute to our understanding of how language networks grow and change over time. Hills et al. (2009, 2010) have shown that the structure of the learning environment plays an important role in language acquisition. In addition, Hills et al. (2010) reported that phonological neighbors help predict the order of acquisition of nouns. Furthermore, recent work by Beckage et al. (2011) found that the tendency of late talkers to acquire semantically novel words relative to known words could have resulted in the less "small world-like" structure of late talkers' early semantic networks compared to those of their typically developing peers.

Based on the above research findings, it is plausible that the early stages of language acquisition are crucial in the development of robust communities, which ultimately promote the growth of a cohesive language network that allows rapid and efficient lexical processing. The present finding that larger communities tend to consist of words learned earlier in life further suggests that larger communities may be among the first to develop during language acquisition, and may be a precursor to the development of a robust language network. In addition, the finding that words containing similar phonological segments tend to belong to the same communities could lead to theoretically motivated predictions about the phonological properties of the words that children tend to learn first, and how these might differ for children with language disorders or learning impairments. For instance, words that share similar phonological segments may tend to be acquired at about the same time and thereby form the foundation of a new community within the phonological network. This is not inconceivable given that songs, limericks, and nursery rhymes are important features of a child's early language learning environment. This could further motivate the design of language learning programs or protocols that specify which words should be acquired first (based on their phonological properties and community membership), which could help children with language or learning disorders establish robust community structure and, subsequently, a robust language network. It should be noted that these are speculations on my part, and additional research is required to address these interesting hypotheses.

IMPLICATIONS FOR LANGUAGE EVOLUTION 

As mentioned in the Introduction, communities are of special interest to network scientists because they are said to be signatures of naturally evolved real networks (Clune et al., 2013; Ravasz and Barabasi, 2003). In particular, it has been shown computationally that modularity and communities arise naturally in a network when evolutionary processes take into account the cost of creating new connections (Clune et al., 2013). With respect to language evolution, I suggest that two types of costs are involved in the creation of new words: the first refers to phonotactic constraints, and the second refers to the communicative constraints of a speaker and a listener.

All languages are known to exhibit combinatorial phonology, a property whereby meaningless units (phonemes) can be combined to form meaningful units (morphemes) (Hockett, 1960). This property is important for language evolution because it allows all the words of a language to be created from combinations of a small number of phonemes (Hockett, 1960; Tria et al., 2012). The small number of phonemes relative to the large, possibly infinite, number of words that could exist in a given language implies that phonemes were combined with other phonemes to form longer segments known as morphemes, so that an unlimited number of referents (represented by different words) could be communicated between people without incurring excessive cognitive and memory costs (Nowak et al., 1999; Zuidema and De Boer, 2009). However, the combinatorial nature of phonology does not imply that all phoneme combinations are possible.

In English, not all phoneme combinations are legal. This represents a phonotactic constraint on the creation of new words: of all the possible words that could be formed by combining different phonemes, only the subset that obeys phonotactic rules would constitute viable candidates for a "new" word. This is consistent with the finding that nonwords containing phonological segments of high phonotactic probability tend to be rated as very "wordlike"; that is, these nonwords are highly plausible words of the language (Frisch et al., 2000; Bailey and Hahn, 2001). In the present study, groups of similar phonological segments were observed within the communities extracted from the phonological network. These segments might constitute the morphemes that result from the combination of different phonemes, which could represent the first important step in language evolution: combinatorial phonology. The presence of community structure in the phonological network, where words belonging to the same community share similar phonological segments and are essentially variants of each other, supports the findings of previous research on the emergence of morphology from phonology (Hockett, 1960; Tria et al., 2012).
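The combinatorial argument and the phonotactic constraint can both be illustrated with a toy inventory. The Python sketch below enumerates every consonant-vowel-consonant (CVC) string from a hypothetical five-consonant, two-vowel inventory, then removes strings that violate two genuine English restrictions: /ŋ/ does not begin English words and /h/ does not end them. The inventory, the restriction to CVC shapes, and the `is_legal` rule are simplifying assumptions; an actual phonotactic grammar is far richer.

```python
from collections import defaultdict
from itertools import product

CONSONANTS = ["p", "t", "k", "h", "ŋ"]  # hypothetical toy inventory
VOWELS = ["a", "i"]

def is_legal(onset, vowel, coda):
    """Toy phonotactic grammar with two real English restrictions;
    the vowel is left unconstrained here."""
    return onset != "ŋ" and coda != "h"

# Combinatorial phonology: 5 x 2 x 5 = 50 possible CVC strings.
candidates = list(product(CONSONANTS, VOWELS, CONSONANTS))
legal = ["".join(c) for c in candidates if is_legal(*c)]
print(len(candidates), len(legal))  # 50 32
print(legal[:4])                    # ['pap', 'pat', 'pak', 'paŋ']

# Group legal forms by their shared vowel + coda ("rime").
by_rime = defaultdict(list)
for onset, vowel, coda in candidates:
    if is_legal(onset, vowel, coda):
        by_rime[vowel + coda].append(onset + vowel + coda)
print(by_rime["at"])                # ['pat', 'tat', 'kat', 'hat']
```

Each rime family is a set of mutual one-phoneme substitution neighbors, a miniature analogue of the groups of words sharing phonological segments that were observed within communities.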

An account of how language has evolved also needs to take into account the communicative constraints that arise when people communicate with each other. An "ideal" language would consist of words that are very phonologically distinct from all other known words, as this would minimize communication errors, although it may reduce production efficiency because a speaker has to be able to articulate a wide variety of different phonological sequences. However, real languages tend to contain many words that are phonologically very similar to other known words (e.g., Landauer and Streeter, 1973; Frauenfelder et al., 1993), and, at least in English, these phonologically similar words are organized within the community structure of the phonological network. From the perspective of a listener, the existence of many phonologically similar words may result in more lexical retrieval errors, or at least reduce processing efficiency (Frauenfelder et al., 1993), as these words are connected to many other words and compete for recognition within the phonological network. Nevertheless, it is possible that community structure within the language network provides some form of scaffolding for lexical processing, thereby increasing the efficiency of lexical retrieval. This may be especially important if each of these phonologically similar words maps onto a different semantic referent, as in English. In comparison, greater morphological similarity exists among Spanish words than among English words (Arbesman et al., 2010a), which could explain the finding that phonologically similar Spanish words are recognized more quickly than less phonologically similar Spanish words (Vitevitch and Rodriguez, 2005), in contrast to the inhibitory neighborhood density effect observed for English words.

Therefore, the presence of community structure in the phonological network may constitute preliminary evidence for the hypothesis that a language network evolves in a way that takes into account the competing needs of the listener and speaker, and that strikes a balance between polysemy (differentiation of meanings) and the phonological similarity of words in a given language (Ferrer i Cancho and Sole, 2003).

CONCLUSIONS 

In the present paper, community detection methods revealed the presence of communities in the phonological network and also uncovered novel aspects of the network, such as (1) the finding that larger communities consist of short, frequent words of high degree and low age of acquisition ratings, whereas smaller communities consist of longer, less frequent words of low degree and high age of acquisition ratings, (2) the similarity between the pattern of biphone distributions within communities and the frequency-of-occurrence pattern of words in a language, and (3) the clustering of similar phonological segments in each community. These novel findings were observed using a community detection method that considers the structure of a network at an intermediate level, rather than at a purely global or local level. Therefore, even though similar relationships between the lexical characteristics of words have been found in previous studies (e.g., Zipf, 1935; Landauer and Streeter, 1973; Frauenfelder et al., 1993), the present findings are still significant because they pertain to the mesoscopic level of the network, whereas the patterns found in prior work concerned the overall relationships among words in the lexicon.
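The biphone analysis summarized in point (2) rests on a simple tally. As a hedged illustration, the Python sketch below counts adjacent phoneme pairs (biphones) across the words of a hypothetical community, again assuming one character per phoneme, and ranks them from most to least frequent. The word list is invented for illustration and is not drawn from the present data; with real communities, one would then inspect whether the resulting rank-frequency curve is Zipf-like.

```python
from collections import Counter

def biphones(word):
    """Adjacent phoneme pairs in a word (one character = one phoneme, by assumption)."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def rank_biphones(community):
    """(biphone, count) pairs, most frequent first; ties keep first-seen order."""
    counts = Counter(bp for word in community for bp in biphones(word))
    return counts.most_common()

community = ["kat", "bat", "mat", "kap", "tap"]  # hypothetical community members
for rank, (bp, n) in enumerate(rank_biphones(community), start=1):
    print(rank, bp, n)
# 1 at 3
# 2 ka 2
# 3 ap 2
# 4 ba 1
# 5 ma 1
# 6 ta 1
```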

These findings also have important implications for understanding the dynamics of the spread of activation within the phonological network, language acquisition, and the nature of language evolution. In particular, the presence of community structure within the phonological network could be said to be a "signature" of language evolution, which could further indicate the different ways in which language could have evolved: to take into account articulation costs so as to maximize communicative efficiency, or to allow for the emergence of morphology from phonology. Although these conclusions are admittedly somewhat speculative, the present findings are significant because they show not only that community structure exists within the phonological network but, more importantly, that this community structure reflects the grouping of phonological word forms that have similar lexical characteristics and contain similar phonological segments at a mesoscopic level. Future research can investigate these intriguing speculations in greater detail. Another potential avenue of research could involve comparative analyses across languages to determine whether these meso-level properties are also observed in other languages.